(54) Title: SYNTHETIC GENE FOR EXPRESSING ACTIVE RETROVIRAL PROTEIN IN EUKARYOTES



(57) Abstract 

The present invention features a synthetic gene or region 
of a gene which has an amended codon usage compared with 
the wild-type gene and which is for the high level expression of 
a retroviral protein in eukaryotic cells, the expressed retroviral 
protein having enzymatic activity in the eukaryotic cell. In 
addition, the invention features a synthetic gene or region of 
a gene encoding a retroviral enzyme or part of a retroviral 
enzyme normally expressed in a mammalian or other eukaryotic 
cell wherein at least one non-preferred codon in the wild-type 
gene encoding the enzyme has been replaced by a preferred 
codon encoding the same amino acid. The retroviral protein 
may be a protease, reverse transcriptase, integrase protein or a 
polyprotein gag-pol precursor thereof. In one embodiment the 
retroviral protein with enzymatic activity is a lentiviral protein. 
In other embodiments the enzymatically active protein is a pol 
enzyme. In more preferred embodiments, the enzymatically 
active protein is a lentiviral integrase. In an even more preferred 
embodiment the enzyme is an HIV enzyme. In more preferred 
embodiments the enzymatically active protein is HIV integrase. 
The present invention also includes a detection method for 
intracellular integrase using a promoterless reporter gene. 
SYNTHETIC GENE FOR EXPRESSING ACTIVE RETROVIRAL PROTEIN IN EUKARYOTES



The present invention relates to the design of a synthetic gene for expressing retroviral
proteins in eukaryotic cells, especially mammalian cells, as well as to the synthetic gene
itself, an expression vector containing the gene, eukaryotic cells stably harboring the
gene, and methods of detection.

Technical Background 

Retroviruses are diploid positive-strand RNA viruses that replicate through an
integrated DNA intermediate. Typically, retroviruses comprise a protein-containing
lipid envelope surrounding a protein-encapsulated core carrying the viral genome.
Within the infected cell the retroviral genome is reverse-transcribed into
double-stranded DNA by a virally encoded reverse transcriptase enzyme that is part of the
retroviral particle. The particle also includes other enzymes such as integrase. Integrase
is the virus-encoded enzyme that is responsible for inserting the viral DNA copy into
the chromosome of the host cell, a process referred to as retroviral integration (for a
review see Brown (1997), in Retroviruses, Cold Spring Harbor Laboratory Press USA,
pp. 161-203). Integration is an essential step in the replication cycle of the human
immunodeficiency virus type 1 (HIV-1), the causative agent of AIDS (LaFemina et al.
(1992), J. Virol. 66: 7414-7419). Since no human counterpart is known to exist,
integration has attracted considerable attention as a potential new antiviral target.
However, integrase inhibitor development has suffered from the lack of a relevant cellular
integration assay; integrase activity is typically evaluated using artificial
oligonucleotide-based test tube reactions. There is therefore a need to provide an
intracellular integration assay.

Wild-type retroviral genomes contain at least three genes known as the gag, pol
and env genes. The gag gene encodes internal core structural proteins, the pol gene
encodes certain enzymes such as protease, reverse transcriptase and integrase, and
the env gene encodes the retroviral envelope glycoproteins. Integrases from different
retroviruses vary in size from 30 to 46 kDa, are encoded by the 3'-end of the pol gene
and are released from a gag-pol polyprotein precursor by proteolytic processing. The
amino-terminal domain of integrase is characterized by a zinc finger (HHCC), is
universally conserved among all retroviruses, and is essential for in vivo integration.
The central domain is the most conserved region with an essential DD35E motif
involved in catalysis. This portion can catalyze the disintegration reaction in vitro. The
carboxy-terminal domain is referred to as the DNA binding domain and shows the least
sequence conservation. This fragment is required for 3'-end processing and integration.
The active enzyme is thought to exist as a multimer wherein active domains can
transcomplement inactive domains.

Transient expression of avian sarcoma-leukosis virus (ASLV) integrase in COS
cells has been obtained previously (Morris-Vasios et al. (1988), J. Virol. 62: 349-353).
A mouse cell line stably expressing the integrase of Rous sarcoma virus (RSV) has also
been reported (Mumm et al. (1992), Virology 189: 500-510). Expression levels were
not specified but appeared rather low. The integrase (IN) of HIV-1 has been expressed
in Escherichia coli (E. coli) (Sherman and Fyfe (1990), Proc. Natl. Acad. Sci. USA 87,
5119-5123), insect cells using baculovirus (Bushman et al. (1991), Science 249:
1555-1558), and Saccharomyces cerevisiae (Caumont et al. (1996), Curr. Genet. 29:
503-510). In yeast, integrase expression proved to be toxic in cells defective in DNA
repair. High level expression of HIV-1 integrase in mammalian cells has remained elusive,
in large part because expression of HIV-1 gag and pol proteins in general is
Rev-dependent (Cullen (1992), Microb. Rev. 56: 375-395). In mammalian cells
Rev-dependent expression of HIV-IN or HIV-IN fused to β-galactosidase or GFP has been
reported previously (Faust et al. (1995), Biochem. Mol. Biol. Int. 36: 745-758; Kukolj
et al. (1997), J. Virol. 71: 843-847; Pluymers et al. (1999), Virology 258: 327-332).
However, expression levels, even after transient transfection, were always low. In the
absence of Rev, multiple inhibitory or instability sequences (INS), also referred to as
cis-acting repressor elements (CRS), in the mRNA interfere with protein expression.
Potential mechanisms include nuclear retention and mRNA instability. It was observed
that mRNA containing CRS is trapped in the nuclei and that the inhibition of
expression is at least partly due to the poor translocation of mRNA to the cytosol
(Mikaelian et al. (1996), J. Mol. Biol. 257: 246-264; Borg et al. (1997), Virology 236:
95-103). Elements of the RNA processing machinery could be involved in nuclear
trapping of mRNA that contains CRS. There is also evidence that several regions of the
HIV-1 genome that contribute to the instability of the mRNA have high AU contents.
They may represent binding sites for cellular factors which contribute to mRNA
instability (Schneider et al. (1997), J. Virol. 71: 4892-4903). According to another
hypothesis, mRNA containing inhibitory sequences fails to be translated efficiently
without Rev. Whatever the mechanism of the observed inhibition, it is clear that
inhibition occurs at the level of the mRNA and is due to some AU-rich regions. During
the HIV replication cycle Rev interaction with the Rev responsive element (RRE)
relieves the inhibition in a regulated manner (Schwartz et al. (1992), J. Virol. 66:
150-159). In this perspective, it is not surprising that by mutating some INS while
preserving the coding function for gag-pol transcripts, efficient Rev-independent
expression of viral particles has been obtained (Schneider et al. (1997), J. Virol. 71:
4892-4903). There is evidence that in the case of HIV gp120 mRNA poor
translatability due to inefficient codon usage rather than mRNA instability is
responsible for low level protein expression (Haas et al., 1996; Schneider et al. (1997),
J. Virol. 71: 4892-4903).

US 5,811,270 (Grandgenett) describes a test tube method of analysis of
concerted integration in which a viral integrase enzyme is first incubated with donor
DNA molecules followed by incubation with target DNA molecules. The donor DNA
has at least one unique restriction site for analysis of the concerted integration product.
The described method is said to be useful for studying integrase, such as screening of
HIV-1 or HIV-2 integrase inhibitors, as well as for production of transgenic non-human
animals and gene transfer. The integrase used is purified from virus particles and the
activity is analyzed in the test tube, not intracellularly.

US 5,795,737, WO 96/09378, WO 97/11086 and WO 98/12207 all describe
methods of producing a synthetic gene encoding a protein normally expressed in a
mammalian cell whereby the synthetic gene is reported to overexpress the encoded
proteins in mammalian cells. The known synthetic genes are constructed by replacing
non-preferred codons or less preferred codons with preferred codons which encode the
same amino acid by utilising the redundancy of the genetic code. Examples are given of
synthetic env genes which encode envelope glycoproteins but there is no discussion of
expressing a protein with enzymatic activity in the host cell. A method of designing a
synthetic gene for the overexpression of a protein while maintaining its enzymatic
activity is not derivable from the known teaching. A significant number of
factors may result in expression of a (retroviral) protein which fails to show
intracellular enzymatic activity. The expressed enzyme may be defective for many
reasons, of which intracellular inhibition of the enzyme and the need for the presence of
another viral protein at the same time are but a few. Further, it is not obvious that an
enzyme can be overexpressed; for example, there may be some limiting factor such as
poor solubility or cellular toxicity. On the one hand, high level expression of a retroviral
enzyme will be required to detect the enzymatic activity; on the other hand, levels which
are too high may cause protein precipitation or cellular toxicity. For any retroviral
enzyme to be active in the cell an optimal intracellular concentration will be required.
In case of failure the known teaching suggests replacing non-preferred codons to a certain
percentage, e.g. 90%, 80%, 70%, but there is no precise teaching of how to select
which codons are to be replaced. In particular, there is no indication that a specific
nucleotide pair frequency is of relevance to high level gene expression. It is not
conclusive that the mechanism of RRE-instability (in env) is the same as, or even
related to, the mRNA instability problem in gag and pol. In fact there is evidence that
the mechanisms are different. Hence, it is not predictable that Rev-independent
expression of an env gene may be extrapolated to cure the instability problem of gag
and pol genes.

It is an object of the present invention to develop an efficient expression system
for an enzymatically active retroviral protein, in particular HIV-1 integrase, in
eukaryotic cells, especially mammalian cells.

It is a further object to provide a more efficient detection method for retroviral
enzyme inhibitors.

It is a further object of the present invention to provide a design method for the
construction of a gene encoding a retroviral protein with enzymatic activity.

A further object of the present invention is to provide an expression vector
capable of delivering a gene to a target cell, in which cell the enzymatically active
protein encoded by the gene is expressed.

Summary of the invention 

The present invention features a synthetic gene or region of a gene which has an
amended codon usage compared with the wild-type gene and which is for the high level
expression of a retroviral protein in eukaryotic cells, the expressed retroviral protein
having enzymatic activity in the eukaryotic cell. In addition, the invention features a
synthetic gene or region of a gene encoding a retroviral enzyme or part of a retroviral
enzyme normally expressed in a mammalian or other eukaryotic cell wherein at least
one non-preferred codon in the wild-type gene encoding the enzyme has been replaced
by a preferred codon encoding the same amino acid. By "region of a gene with
amended codon usage" is meant that it can be sufficient to change codons only in those
parts of a gene that normally produce instability sequences (INS) or cis-acting repressor
elements (CRS) in the transcribed mRNA of the gene.

By "retroviral protein or enzyme normally expressed in a mammalian or
eukaryotic cell" is meant a protein or enzyme which is expressed in a mammalian or
eukaryotic cell under disease conditions. These are genes which are encoded by a
retrovirus (including a lentivirus) which are expressed in mammalian or eukaryotic
cells post-infection.

In preferred embodiments, the synthetic gene is capable of expressing the 
retroviral enzyme at a level at least 200% of that expressed by the "natural" (or 
"native") gene in a mammalian or eukaryotic cell culture system. 

The retroviral protein may be a protease, reverse transcriptase, integrase protein
or a polyprotein gag-pol precursor thereof. In one embodiment the retroviral protein
with enzymatic activity is a lentiviral protein. In other embodiments the enzymatically
active protein is a pol enzyme. In more preferred embodiments, the enzymatically
active protein is a lentiviral integrase. In an even more preferred embodiment the
enzyme is an HIV enzyme. In more preferred embodiments the enzymatically active
protein is HIV integrase. The enzymatic activity includes at least an integrase function,
namely promotion or stimulation of the integration of DNA fragments into host cell
DNA, preferably the chromosome of the host cell. The integrase is hereby expressed on
its own, i.e. as a single component, independent of any retroviral components.

By "retroviral components" is meant the retroviral, specifically the lentiviral,
and more specifically the HIV-1 regulatory and accessory proteins such as Tat, Rev, Nef,
Vpu, Vif and Vpr.

The invention also features a eukaryotic expression vector comprising the
synthetic gene or region of a gene. The expression vector preferably includes a
constitutive or an inducible or a tissue-specific promoter. Expression from the
eukaryotic expression vector can be transient after transfection of the vector in a
eukaryotic cell by any suitable, e.g. established, transfection procedure. The vector
may be any suitable vector such as a plasmid or a mammalian or insect virus. Expression






WO 00/65076 PCT/EP00/03765 


may also be permanent in a eukaryotic cell line stably harbouring the expression vector.
The expression vector may be comprised in a packaging construct for producing
retroviral particles for gene transfer. The retroviral particle may be a lentiviral particle.

Another aspect of the present invention features a eukaryotic cell line that
harbours the synthetic gene or region of a gene. The cell line preferably expresses the
retroviral enzymatically active protein using a constitutive, inducible or tissue-specific
promoter. The expressed retroviral protein shows enzymatic activity that can be
measured, for example, by complementation of enzyme-defective viruses or, in the case
of an integrase, by stimulation or promotion of the insertion of DNA molecules into
another DNA molecule, preferably the chromosome of the cell.

The present invention also includes a transgenic non-human animal harboring
the synthetic gene or region of a gene. The expression of the gene or region of a gene
may be induced at any moment using an inducible promoter or, alternatively, in desired
tissues using a tissue-specific promoter.

The present invention also features a method for preparing a synthetic gene or
region of a gene encoding an enzymatically active retroviral protein or part of such a
protein. The method not only identifies and uses preferred codon usage but also, and
indeed mainly, seeks to increase mRNA stability during expression. The method
includes identifying a small group of genes from the total set of genes of a target
eukaryotic cell which encode proteins which are naturally expressed easily and/or in
high concentrations in the target cell. The small group may include 10 or fewer genes,
more typically 5 or fewer genes. From the codon sequences of these identified genes, a
preferred codon usage and a preferred nucleotide relationship or nucleotide pair
frequency is identified. By preferred codon usage is meant that for a specific amino
acid a specific codon is chosen as the preferred codon to encode the amino acid based
on the high use of the preferred codon within the select group of genes. By a preferred
nucleotide relationship is meant the ratios of the various nucleotides and combinations of
nucleotides to each other which commonly appear in genes of the target eukaryotic cell.
One particular nucleotide relationship is the GC content or the GC nucleotide pair
frequency. Using the preferred codon usage, non-preferred codons are identified in the
natural gene encoding the enzyme and one or more of the non-preferred codons is/are
replaced with a preferred codon encoding the same amino acid as the replaced codon.
The replacement is biased to obtain the preferred nucleotide relationship or nucleotide
pair frequency, resulting in even better optimized conditions for expression in
eukaryotes compared to the use of preferred codon usage only. The replacement may
be made based on a random choice between alternative codons encoding the same
amino acid at each position using a random number generator and biasing the choice of
alternative codons based on the preferred codon usage to obtain the preferred
nucleotide relationship or nucleotide pair frequency. In addition, the synthetic gene
sequence may be edited by removing potential splice sites and reducing the number of
CpG methylation sites while keeping the overall nucleotide relationship or the
nucleotide pair frequency close to the preferred one, e.g. keeping the GC content and
codon usage close to the preferred one. GC content should be kept close to the
preferred usage in the target cell, e.g. about 60% in mammalian cells. A preferred range
for the GC content is 53 to 63%, more preferably 55 to 61%, for expression of the gene
in human cells. To provide efficient initiation of translation the Kozak consensus
sequence (ANNATGG) may be added.
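The design steps above can be sketched in code. The following is a minimal illustration only, not the procedure actually used for the synthetic integrase gene: it derives a codon usage table from a small reference set, draws synonymous codons at random with weights taken from that table, and accepts the first candidate whose GC content falls in the 53 to 63% window. Rejection sampling is used here as a simple stand-in for the biasing described in the text. The names `codon_usage`, `design`, `REFERENCE` and `WILD_TYPE` are hypothetical, and the sequences are toy placeholders, not real human genes or the patent's sequence.

```python
import random
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(seq):
    """Split an in-frame DNA sequence into triplets (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq):
    """Fraction of G and C nucleotides in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(reference_genes):
    """Relative frequency of each codon among its synonyms in the reference set."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    by_aa = defaultdict(dict)
    for codon, aa in CODON_TABLE.items():
        by_aa[aa][codon] = counts[codon]
    usage = {}
    for aa, table in by_aa.items():
        total = sum(table.values())
        # Uniform weights if the reference set never uses this amino acid.
        usage[aa] = {c: (n / total if total else 1 / len(table))
                     for c, n in table.items()}
    return usage


def design(wild_type, usage, gc_range=(0.53, 0.63), tries=1000, seed=0):
    """Draw synonymous recodings until one lands in the GC window.

    If none does within `tries` draws, the candidate closest to the
    middle of the window is returned instead.
    """
    rng = random.Random(seed)
    protein = [CODON_TABLE[c] for c in codons(wild_type)]
    target = sum(gc_range) / 2
    best = None
    for _ in range(tries):
        candidate = "".join(
            rng.choices(list(usage[aa]), weights=list(usage[aa].values()))[0]
            for aa in protein)
        gc = gc_content(candidate)
        if gc_range[0] <= gc <= gc_range[1]:
            return candidate
        if best is None or abs(gc - target) < abs(gc_content(best) - target):
            best = candidate
    return best


# Toy placeholders (assumptions for illustration only).
REFERENCE = ["ATGGCCCTGAAGGACATCGGCTTCGACAAGTAA",
             "ATGCTGGCCGACAAGATCTTCGGCGCCTGA"]
WILD_TYPE = "ATGTTTTTAGATGGAATAGATAAAGCATAA"  # AT-rich illustrative fragment

synthetic = design(WILD_TYPE, codon_usage(REFERENCE))
```

With this toy reference set almost every amino acid has a single dominant codon, so the random choice has little effect; with a realistic reference set the weighting and the GC window would both shape the result. Splice-site removal, CpG reduction and addition of the Kozak sequence are not covered by this sketch.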

It is not necessary to replace all non-preferred codons with preferred codons.
Increased expression may be accomplished even with partial non-preferred codon
replacement with preferred codons. Under some circumstances it may be desirable to
only partially replace non-preferred codons with preferred codons in order to obtain an
intermediate level of expression.
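Partial replacement can be illustrated with a short sketch. It assumes a table mapping each amino acid to a single preferred codon and replaces only a chosen fraction of the non-preferred codons, picked at random. The function name `replace_fraction`, the `PREFERRED` table and the example sequence are hypothetical and serve only as illustration; the text does not specify how the codons to be replaced are selected.

```python
import random

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def replace_fraction(gene, preferred, fraction, seed=0):
    """Replace roughly `fraction` of the non-preferred codons with preferred ones.

    `preferred` maps an amino acid to the codon to use for it; amino acids
    missing from the table are left untouched.
    """
    gene = gene.upper()
    triplets = [gene[i:i + 3] for i in range(0, len(gene) - len(gene) % 3, 3)]
    # Positions whose codon differs from the preferred codon for that amino acid.
    non_preferred = [i for i, c in enumerate(triplets)
                     if preferred.get(CODON_TABLE[c], c) != c]
    k = round(fraction * len(non_preferred))
    for i in random.Random(seed).sample(non_preferred, k):
        triplets[i] = preferred[CODON_TABLE[triplets[i]]]
    return "".join(triplets)


# Illustrative assumptions, not values taken from the text.
PREFERRED = {"F": "TTC", "L": "CTG", "D": "GAC", "G": "GGC",
             "I": "ATC", "K": "AAG", "A": "GCC"}
WILD_TYPE = "ATGTTTTTAGATGGAATAGATAAAGCATAA"

full = replace_fraction(WILD_TYPE, PREFERRED, 1.0)
half = replace_fraction(WILD_TYPE, PREFERRED, 0.5)
```

Varying `fraction` gives a family of genes with the same encoded protein but differing degrees of codon replacement, which is one way intermediate expression levels might be explored.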

By "synthetic gene" is meant a nucleotide sequence encoding a naturally
occurring protein in which a portion of the naturally occurring codons has been
replaced by other codons. For example, a non-preferred codon is replaced with a
preferred codon encoding the same amino acid. More generally, by replacing codons to
create a synthetic gene, the expression in eukaryotic, e.g. mammalian, cells (especially
human cells) of a wide variety of genes (of eukaryotic, mammalian, prokaryotic or viral
origin) can be increased compared to the expression of the naturally occurring gene. Thus,
the invention includes improving the eukaryotic, especially mammalian, cell expression of
a gene from any source by the codon replacement methods described herein.
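Because a synthetic gene in this sense must still encode the naturally occurring protein, a simple consistency check is to translate both sequences and compare them. The sketch below is an illustration under stated assumptions: the function names `translate` and `compare` and both example sequences are hypothetical and are not taken from the sequence of Figure 2.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codons(seq):
    """Split an in-frame DNA sequence into triplets (a trailing partial codon is dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    """Translate an in-frame DNA sequence; stop codons are shown as '*'."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def compare(wild_type, synthetic):
    """Return (same_protein, fraction_of_codons_changed).

    The fraction is computed over the wild-type codon count; positions are
    compared pairwise, so it is only meaningful when both sequences have
    the same number of codons.
    """
    wt, syn = codons(wild_type), codons(synthetic)
    same_protein = translate(wild_type) == translate(synthetic)
    changed = sum(a != b for a, b in zip(wt, syn))
    return same_protein, changed / len(wt)


# Hypothetical example sequences for illustration only.
WILD_TYPE = "ATGTTTTTAGATGGAATAGATAAAGCATAA"
SYNTHETIC = "ATGTTCCTGGACGGCATCGACAAGGCCTAA"

same, fraction_changed = compare(WILD_TYPE, SYNTHETIC)
```

A check of this kind catches accidental amino acid changes introduced while editing codons, for instance when removing splice sites or CpG dinucleotides by hand.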

By "vector" is meant a DNA molecule, derived, e.g., from a plasmid, or
mammalian or insect virus, into which fragments of DNA may be inserted or cloned. A
vector will contain one or more unique restriction sites and may be capable of
autonomous replication in a defined host or vehicle organism such that the cloned
sequence is reproducible. Thus, by "expression vector" is meant any autonomous
element capable of directing the synthesis of a protein. Such DNA expression vectors
include mammalian plasmids and viruses.

By retroviral "packaging construct" or "packaging vector" is meant a
plasmid-based or virus-based vector or construct, configured to encode the proteins
necessary for producing virus particles that are devoid of genomic RNA. In general,
this implies providing the gag, pol and env gene products. Lentiviral packaging
constructs of interest contain changes to the coding sequences of gag or pol proteins
(i.e. synthetic genes) to enhance lentiviral protein expression and to enhance safety. For
biosafety reasons, the packaging functions are often divided into two genomes, one
which expresses the gag and pol genes and another expressing the env gene product. In
packaging constructs in accordance with the present invention, regulators of gene
expression such as the Rev gene product would no longer be required. Increased
biosafety of these packaging constructs is based on a reduced risk of (homologous)
recombination of these synthetic genes with their wild-type counterparts.

The invention also features a synthetic portion of a gene which encodes a desired
portion of the protein. Such synthetic gene fragments are similar to the synthetic genes
of the invention except that they encode only a portion of the protein. The portion of
the gene encodes a portion of the enzyme which has some enzymatic activity, e.g. it
may have catalytic activity. For example, the synthetic gene may encode a catalytic core
of an enzyme, e.g. it may be a part of reverse transcriptase.

The present invention also includes a detection method for intracellular
integrase using a promoterless reporter gene. The reporter gene may be luciferase, GFP
or an antibiotic selection marker (e.g. neomycin resistance). The reporter gene
construct may be used as the substrate of the retroviral enzyme, e.g. integrase, expressed
from the synthetic gene, be it in a stable cell line or in a transient mode after transfection
of the expression vector, the retroviral enzyme, e.g. integrase, being in accordance with
the present invention.

The present invention may provide a synthetic gene and a method of designing
and constructing the same to obtain efficient expression of a retroviral, in particular
lentiviral, enzyme such as integrase of the human immunodeficiency virus type 1
(HIV-1), or part of a retroviral enzyme, in mammalian cells. The synthetic gene circumvents
mRNA instability by increasing the GC content of the wild-type integrase gene from
40% to 59%. The synthetic gene, cloned in a eukaryotic expression vector, provides
efficient expression of HIV-1 integrase in various mammalian cell lines. The amino
terminus of the protein was as predicted by the sequence after removal of the first
methionyl residue. Nuclear localization of the recombinant protein was evidenced by
fluorescence microscopy. A 293T cell line stably expressing HIV-1 integrase was
obtained. The functionality of integrase was proven by trans-complementation
experiments. Lentiviral vector particles carrying the inactivating D64V mutation in the
integrase gene were capable of stably transducing 293T cells when
complemented in the producer cell line with integrase expressed from the synthetic
gene. When the cell line that stably expresses integrase was infected with the defective
virus particles, complementation of integrase function was observed. Transfection with
a linear promoterless DNA substrate that contains a reporter gene behind an IRES and
is flanked by HIV LTR ends resulted in a reproducibly higher reporter signal in cells
that express integrase. Since the increase in reporter gene activity was stable upon
passaging of the transfected cells, it can be concluded that the integrase promotes
insertion of the linear DNA substrate in the cellular chromosome. The fold increase of
reporter signal with integrase expressed from a mutant synthetic gene, containing the
D64V mutation, was considerably lower, indicating that the enzymatic activity of the
enzyme was required. The established cellular integration system in accordance with
the present invention facilitates the study of the interplay between host and viral factors
during integration, the development of specific HIV integration inhibitors as well as the
design of gene transfer systems.

The present invention, its advantages and embodiments will now be described 
with reference to the following figures and drawings. 



Brief description of the drawings

Figure 1. Western blot analysis of transient expression of HIV-1 IN in 293T
cells using different expression strategies. 293T cells were transiently transfected
with the various expression vectors. At 48 hrs post transfection cell extracts were made
using 1% SDS, 1 mM PMSF. Cell extracts representing 10 µg of total protein were
separated by PAGE and blotted onto PVDF membranes. Detection was performed
using polyclonal antibodies against HIV-1 integrase and the ECL+ detection system.
Lane 1 contains 2.5 ng of recombinant and purified His-tagged HIV-1 integrase
(HT-IN). The other lanes contain extracts after transfections with equal amounts of the
following plasmids: Lane 2, pCEP4; Lane 3, pCEP-IN; Lane 4, pCEP-IN-CTE; Lane 5,
pCEP-IN-RRE + pEF-cREV; Lane 6, pCMV-IN^s.

Figure 2. Sequence and structure of the synthetic gene.

(A) Sequence of the synthetic DNA coding for pNL4-3 HIV-1 integrase. The amino
acid sequence is shown in the single letter code. The restriction sites used in
construction are boxed. The translation initiation site is underlined.

(B) A schematic representation of the structure of the synthetic gene. The following
regions are indicated: the 5'- and 3'-untranslated regions (UTR) derived from
β-globin mRNA, the Met-Gly dipeptide and the integrase open reading frame (ORF). The
three domains of the integrase protein are shown: the zinc finger motif (HHCC), the
catalytic core and the DNA binding domain.

Figure 3. Western blot analysis of the 293T-derived cell line that stably
expresses HIV-1 IN from the synthetic gene. 293T cells were transfected with
pCMV-IN^s and a stable cell line was selected with Hygromycin B. Cell extracts (10 µg
of total protein) were separated by PAGE and blotted onto PVDF membrane. Detection
was performed using polyclonal antibodies against HIV-1 integrase and the ECL+
detection system. Lane 1, 2.5 ng recombinant His-tagged HIV-1 integrase; Lane 2,
extract of 293T cells; Lane 3, extract of 293T cells stably expressing IN (293T-IN^s).

Figure 4. Detection of integrase activity using a promoterless reporter construct
(DIPR). Figs. 4A-C are schematic representations of the method of detection of
integrase activity using a promoterless reporter gene.

Description of the illustrated embodiments 

The present invention will mainly be described with reference to a synthetic
gene for overexpressing HIV integrase in mammalian cells, but the invention is not
limited thereto; it is limited only by the claims.

It has long been known that expression of eukaryotic genes in prokaryotes can
be optimised by designing synthetic genes with modified codon usage. Less
established, although demonstrated, is the concept of increasing expression of
eukaryotic genes in eukaryotic cells by modified codon usage. From bacteria it is
known that few general rules apply (Makrides F. (1996), Microb. Rev 60: 512-538).

A retroviral enzyme such as integrase does not normally, during the infectious
cycle, work as a soluble protein in the cytoplasm of a host cell.
Integrase is part of a large ill-defined nucleoprotein complex called the preintegration
complex, of which reverse transcriptase, nucleocapsid, matrix protein, the viral
DNA and other factors are also part. It is not obvious that integrase on its own in the
cytoplasm of a target cell is enzymatically active; for example, there may be cellular
factors which inhibit activity or viral factors which are missing in this environment.
Further, it is not obvious that integrase expressed as such will interact with artificial
DNA substrates (see DIPR below). One aspect of the present invention is dissecting the
preintegration complex to obtain a simple integrase-linear DNA interaction. One
embodiment of the present invention is a method to detect and utilize the enzymatic
activity of a retroviral, in particular a lentiviral, enzyme, in particular integrase, by itself
in a eukaryotic cell.

Initially eukaryotic expression vectors encoding HIV-1 IN and IN-RRE were
constructed based on the reasoning that co-expression of Rev in cells transfected with
IN-RRE would increase expression levels of IN. However, in human cells transiently
transfected with these expression vectors, little or no expression of IN was detected by
either immunofluorescence microscopy or western blotting (Fig. 1). An alternative
approach consisted of introducing the constitutive transport element (CTE) of simian
retrovirus type 1 behind the integrase gene. Again, expression was barely detectable
upon prolonged exposure of the blot and amounted to merely 40 ng per 10 x 10^6
transfected cells. The construction of a C-terminal fusion to green fluorescent protein
(GFP) (GFP-IN) resulted in a more pronounced expression of wild-type HIV-1
integrase in mammalian cells (Pluymers et al. (1999), Virology 258: 327-332).
Rev co-expression was not required, in accord with Kukolj et al. ((1997), J. Virol.
71: 843-847), who expressed integrase as a C-terminal fusion protein with
β-galactosidase in the absence of Rev. The impact of the INS in the IN gene on protein
expression levels was illustrated by a 5-fold decrease in expression levels of the
GFP-IN construct compared to the parental GFP (Pluymers et al. (1999), Virology 258:
327-332). The present invention is based on a synthetic gene for HIV-1 integrase with an
increased intrinsic mRNA stability. The use of such a synthetic gene resulted in high
expression levels and concurrent enzymatic integrase activity, as could be demonstrated
via complementation tests.

In accordance with the present invention, an integrase gene was synthesised
with an increased GC content resulting in high level expression of HIV-1 IN in various
mammalian cell lines. The enzyme was shown to complement defective integrase
carried by HIV-1-derived vector particles and to act in trans on linear DNA substrates
that are flanked by LTR fragments and encode a reporter gene.

Design and construction of the synthetic integrase gene

Synthetic genes have been constructed in the past to optimize expression of 
eukaryotic genes in bacteria based on the knowledge that codon usage in prokaryotes is 
quite different from that in eukaryotes. HIV (lentiviral) genes are not optimal for high 
level expression in eukaryotic cells. This is related to the mechanism HIV uses to 
circumvent the mRNA instability, namely Rev. During the replication cycle, early
mRNA transcripts are spliced, which results in expression of regulatory proteins
such as Tat and Rev. Only late in the cycle do Rev accumulation and the Rev-RRE
interaction block splicing and suppress AT-rich instability sequences, resulting in
unspliced transcripts encoding structural and enzymatic proteins. Whereas the synthetic
gene in accordance with the present invention clearly augments protein expression in
mammalian cells, which is a prerequisite to detect the functionality of the enzyme in 
the cell, in the context of replicating HIV the presence of a gene with an increased GC 
content may well interfere with the mechanism of regulation of gene expression and be 
detrimental for viral replication. 

In accordance with an embodiment of the present invention a synthetic viral
gene was designed for better optimized and more efficient expression in mammalian
cells. The HIV-1 integrase gene has a GC content of 40% whereas highly expressed
human genes on average have a GC content of 55-61%. Hence, the GC content is one
aspect of the preferred nucleotide relationship or nucleotide pair frequency in
accordance with the present invention. By exploiting the degenerate nature of the
genetic code and selecting for the preferred codon usage in the synthetic gene, the GC
content of a synthetic gene encoding HIV integrase could be increased up to 66%
without altering the amino acid sequence. However, this is not preferred in accordance
with the present invention. First of all, in accordance with the present invention, the
choice among the alternative codons was biased in favour of preferred triplets (codons)
found in a small group of genes of the total human genome which express
well/strongly, e.g. human β-globin, α-, γ-actin and EF2 genes (method of determining
the preferred codon usage). In addition the bias was such as to approximate the
preferred nucleotide relationship or nucleotide pair frequency, i.e. within the range 53
to 63%, more preferably 55-61% for the GC content. In fact a GC content of 59%
rather than 66% was achieved. The other rules for redesigning retroviral genes for
eukaryotic expression are: (i) removal of potential splice sites, (ii) reduction of the
number of CpG methylation sites, (iii) introduction of 5'- and 3'-untranslated regions
(UTR) of a mammalian mRNA (in our case from human β-globin), (iv) addition of an
extra N-terminal peptide (Met-Gly for the examples given below) for efficient initiation
of translation. As a result expression levels from the synthetic gene in various
mammalian cell lines were at least 25-fold higher than from the natural integrase gene.
Efficient expression was also obtained in yeast (Pichia pastoris) (data not shown).
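
The GC figures quoted above (40% for the wild type gene, 59% for the synthetic gene, and a target window of 53 to 63%) can be checked for any candidate sequence with a short calculation. The following Python sketch is illustrative only: the function names and the toy sequence are assumptions and are not taken from Fig. 2 or SEQ ID NO:1.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def in_preferred_range(seq: str, low: float = 53.0, high: float = 63.0) -> bool:
    """True when the GC content lies inside the 53-63% window named in the text."""
    return low <= gc_content(seq) <= high


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to exercise the functions.
    candidate = "ATGGGCTTCCTGGAC"
    print(f"GC content: {gc_content(candidate):.1f}%")
    print("within preferred window:", in_preferred_range(candidate))
```

A check of this kind could be run after each round of manual editing (splice site removal, CpG reduction) to confirm that the sequence has not drifted out of the preferred window.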

In accordance with one embodiment of the present invention a gene is provided
to achieve high level expression of HIV-1 integrase in human cell lines by maintaining
the amino acid sequence of IN from the pNL4-3 clone of HIV-1 while adapting the
nucleotide codon usage to the codon usage of constitutively and highly expressed
human genes ("preferred codon usage"). A first version of an artificial IN reading frame
was based on random choice between alternative codons at each position using a
random number generator, biasing in favour of preferred triplets as found in the human
β-globin, α-, γ-actin and EF2 genes. Next, the DNA sequence was substantially edited
to remove potential splice sites and to reduce the number of CpG methylation sites, while
keeping the overall GC content and codon usage close to optimal ("preferred nucleotide
relationship" or "nucleotide pair frequency"). The final version of the synthetic gene
(Fig. 2 or SEQ ID NO:1) contains fragments of the 5'- and 3'-untranslated regions from
the β-globin mRNA. This gene encodes wild type HIV-1 integrase with addition of
the N-terminal Met-Gly dipeptide. The extra glycine codon completes Kozak's
consensus sequence (ANNATGG) required for efficient initiation of translation. In the
synthetic gene the overall GC content is 59% compared to 40% in the wild type. The
gene was constructed from six synthetic DNA fragments, each approximately 150 bp
long, by stepwise cloning. It should be understood that various homologs of the gene
shown in Fig. 2 or SEQ ID NO:1 are included within the scope of the present invention.
Reapplication of the random number biasing procedure in accordance with the present
invention would generate alternative sequences, all of them coding for the same protein
and all having a similar preferred nucleotide relationship or nucleotide pair frequency.
All such synthetic gene homologs are included within the scope of the present
invention.
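
The random, biased choice between synonymous codons described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used: the weighting function `codon_weight` is hypothetical (it simply favours codons ending in G or C three to one), whereas the text derives its bias from codon usage in highly expressed human genes, and those frequencies are not reproduced here. The test peptide is the Met-Gly leader followed by the first integrase residues (Phe-Leu-Asp-Gly-Ile-Asp-Lys).

```python
import random

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_weight(codon: str) -> float:
    """ASSUMED bias: favour codons ending in G or C three to one.

    A real design would use codon frequencies measured in highly
    expressed human genes instead of this placeholder rule.
    """
    return 3.0 if codon[2] in "GC" else 1.0


def back_translate(protein: str, rng: random.Random) -> str:
    """Pick one synonymous codon per residue by weighted random choice."""
    dna = []
    for aa in protein:
        synonyms = [c for c, a in CODON_TABLE.items() if a == aa]
        if not synonyms:
            raise ValueError(f"unknown residue: {aa!r}")
        weights = [codon_weight(c) for c in synonyms]
        dna.append(rng.choices(synonyms, weights=weights, k=1)[0])
    return "".join(dna)


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence back to protein."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that a run is repeatable
    peptide = "MGFLDGIDK"
    for _ in range(3):
        gene = back_translate(peptide, rng)
        assert translate(gene) == peptide  # amino acid sequence is preserved
        print(gene)
```

Each run of the loop yields a different nucleotide sequence encoding the same peptide, which is the sense in which reapplication of the procedure generates homologs. The subsequent manual editing steps (removal of splice sites, reduction of CpG sites) are not modelled here.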

In addition to the synthetic gene described above, the
following modifications and improvements of the synthetic gene are included within
the scope of the present invention. For example, the leader peptide can be replaced,
affecting the efficiency of translation and potential myristoylation (for example, a
Met-Ala variant has been constructed). The 5'- and 3'-UTRs may be replaced by UTRs
from other mammalian mRNAs to optimize the stability of the transcript. Mutations in
the open reading frame are also included within the scope of the present invention
whereby the canonical integrase sequences (e.g. HHCC and DD35E) are preferably left
unchanged. A more soluble version can be made by introducing for example the
F185K/F185H mutations. Other mutations may induce increased or altered catalytic
activity of the enzyme in the eukaryotic cell. For example, the present invention
includes a variant synthetic gene with the D64V mutation, known to reduce drastically
the enzymatic activity of integrase. Synthetic genes of integrase are included within the
scope of the present invention in which the genetic information of domains of other
proteins is added. These domains preferably add additional properties to the enzyme
such as sequence specificity in DNA binding. Examples of methods of providing
specificity to a gene encoding integrase are described in WO 96/37626, US 5,811,270
without describing the specific innovative aspects of the present invention.

The synthetic gene for HIV-1 integrase was designed to circumvent inhibition
of gene expression induced by instability sequences (INS) in the wild type integrase
gene. This approach can be applied to retroviral integrases in general. In particular the
aforementioned design method may be used to redesign any retroviral gene
encoding a protein with enzymatic activity for efficient expression in eukaryotes. In
particular, the design method of synthetic genes in accordance with the present
invention will boost eukaryotic expression for retroviral genes encoding a protein with
enzymatic activity, especially lentiviral integrases and pol proteins in general. The
particular approach of the present invention could also be applied to redesign gag
genes, in which mRNA instability due to the presence of INS elements and not poor
translatability, as for env, would be the problem. Although the role of Rev in
suppressing the effect of INS is only well studied in the case of HIV-1, all other
lentiviruses are known to encode proteins analogous to Rev. Likewise the human T-
lymphotropic and bovine lymphotropic viruses (HTLVs and BLV) encode the Rev
homologue Rex. Simple
retroviruses such as Mason-Pfizer monkey virus and simian retrovirus-1 (SRV-1)
contain a constitutive transport element (CTE) that promotes nuclear export of
unspliced mRNA. It has been shown that CTE can functionally substitute for Rev
interacting with RRE. In fact, a low level transient expression from a wild type
integrase gene with a downstream CTE of SRV-1 has been obtained by us using the
methods of the present invention. Since the design of a synthetic gene in accordance
with the present invention abolishes any need for co-expression of Rev and presence of
RRE or CTE in the construct, this approach can improve expression of retroviral
enzymes in general and integrases in particular.

In creating mammalian expression vectors, various eukaryotic expression
plasmids can be used. Expression can be under control of a constitutive promoter (for
example hCMV and RSV) or an inducible promoter. Examples of (commercially
available) inducible expression systems are the ecdysone-inducible and the tetracycline-
inducible (Tet-Off and Tet-On) expression systems. Tissue-specific promoters that
limit expression to specific tissues may also be envisaged. Examples are the established
neuron-specific promoters Thy-1 and enolase. Inducible promoters may limit cellular
toxicity, although a cell line that stably expresses integrase was obtained. In transgenic
non-human animals harbouring the synthetic gene, expression may be induced at a
desired moment using an inducible promoter or in desired tissues using a tissue-specific
promoter.



Transient and stable expression of HIV-1 integrase in 293T and HeLa cells

The synthetic gene for integrase (INˢ) was cloned into the expression vectors
pCEP4 and pBK-RSV under control of the human cytomegalovirus (hCMV) and Rous
sarcoma virus (RSV) promoters, respectively. Transient and stable expression of IN
was obtained in both 293T and HeLa cell lines, as verified by immunoblotting (Fig. 1,
3) and indirect immunofluorescence (data not shown). In transfected 293T cells the
expression levels from the hCMV promoter amounted to 10-20 µg of IN per 10 x 10^6
cells, which is at least 25-fold higher than obtained with expression vectors that contain
the unfused wild type HIV-1 integrase gene.

Transfection of 293T cells with the episomal expression vector pCEP-INˢ
followed by selection with hygromycin B, resulted in a stable cell line, referred to as
293T-INˢ. Indirect immunofluorescence staining revealed that 80-90% of selected cells
produce integrase at detectable levels. The expression level, as estimated by
quantitative immunoblotting, was about 0.5 µg of integrase per 10 x 10^6 cells. The
reduced cell growth kinetics of 293T-INˢ (30-50% as compared to the parental 293T
cell line) is suggestive of cellular toxicity of integrase in mammalian cells.
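
To put the bulk yields quoted above on a per-cell footing, the following Python sketch converts micrograms of integrase per 10 x 10^6 cells into an approximate number of molecules per cell. It is a rough illustration only: the molecular mass of about 32 kDa for HIV-1 integrase is an assumption introduced here (it is not stated in the text), and the function name is hypothetical.

```python
AVOGADRO = 6.022e23   # molecules per mole
IN_MASS_DA = 32_000   # ASSUMED approximate molecular mass of HIV-1 integrase (g/mol)


def molecules_per_cell(micrograms: float, cells: float,
                       mass_da: float = IN_MASS_DA) -> float:
    """Convert a bulk yield (µg of protein per `cells` cells) to molecules per cell."""
    grams_per_cell = micrograms * 1e-6 / cells
    return grams_per_cell / mass_da * AVOGADRO


if __name__ == "__main__":
    cells = 10e6  # "10 x 10^6 cells" as used in the text
    yields = [("transient, lower bound", 10.0),
              ("transient, upper bound", 20.0),
              ("stable 293T line", 0.5)]
    for label, micrograms in yields:
        print(f"{label}: {molecules_per_cell(micrograms, cells):.2e} molecules per cell")
    print("transient / stable ratio:", 10.0 / 0.5, "to", 20.0 / 0.5)
```

On these figures the stable cell line carries roughly 20 to 40 times less integrase per cell than transiently transfected cells.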

In HeLa cells integrase was found exclusively in the nuclei. In 293T cells transient
transfections typically gave rise to an irregular, granular cytoplasmic distribution of
IN, probably due to precipitation of the protein. In the 293T cell line selected to stably 
express IN, nuclear localization of IN was evident. During the metaphase and anaphase 
steps of mitosis, IN remained stably associated with the chromosomes. 

Solid phase N-terminal sequencing of integrase purified from transiently
transfected 293T cells revealed the following amino terminus: Gly-Phe-Leu-Asp-Gly-
Ile-Asp-Lys. This is the sequence predicted by the synthetic gene, the starting 
methionine being removed post-translationally. 

Functionality of INˢ

Complementation of IN-defective vector particles 

To verify whether the integrase expressed from the synthetic gene in
mammalian cells is enzymatically active, the ability of IN to complement integrase-
defective HIV-derived lentiviral vectors was tested. HIV-1-derived lentiviral vectors
have been developed by Naldini et al. and Zufferey et al. (Naldini et al. (1996), Science
272: 263-267; Zufferey et al. (1997), Nature Biotechnol. 15: 871-875). Pseudotyped
lentiviral vector particles are produced by transfecting 293T cells with a packaging
plasmid encoding viral gag and pol proteins, a plasmid encoding the envelope of
vesicular stomatitis virus and a plasmid encoding a reporter gene flanked by two long
terminal repeats (LTRs). The first generation packaging plasmid pCMVΔR8.2,
containing all HIV genes except for env, and the transfer vector pHR'-CMVLacZ were
used to produce wild type vector (WT vector). Integrase-defective virus particles
(D64V vector) were produced using pCMVΔR8.2IN(D64V) (Naldini et al. (1996),
Science 272: 263-267). The D64V mutation in the integrase gene is known to abolish
integrase activity, without affecting any other step of the infection (Leavitt et al.
(1993), J. Biol. Chem. 268: 2113-2119; Leavitt et al. (1996), J. Virol. 70: 721-728).
The transducing titer of the D64V vector in 293T cells was 20-fold lower than the titer
of WT vector (Table 1). This is in good agreement with previously reported results
(Naldini et al. (1996), Science 272: 263-267). The observed "background" expression
after D64V transduction is mostly due to transcription from non-integrated circularized
viral DNA, since β-galactosidase expression after D64V transduction is reduced
drastically upon passaging the cells (Table 1). Nevertheless, in some of the transduction
experiments 1 or 2 galactosidase-positive colonies were observed. A residual
transducing activity of D64V virus was observed before (Gaur and Leavitt (1998), J.
Virol. 72: 4678-4685). It is possible that this integration is independent of the viral
integrase.

Complemented vectors (C IN) were produced after quadruple transient
transfection of producer cells, including pCEP-INˢ, the expression vector containing
the synthetic gene. The transducing activity was restored up to 30% with C IN (Table
1). Complementation was due to stable integration, since an equal proportion of
galactosidase-positive colonies was counted after multiple passages of the transduced
cells. The principle of trans-complementation of IN-defective virus was shown
previously, using VPR-IN fusion expression constructs (Fletcher et al. (1997), EMBO
J. 16: 5123-5138). The transducing activity of catalytic domain mutants of IN was
restored up to 20% by trans-complementation with VPR-IN. However, in the absence of
VPR, the expression construct for wild type integrase only achieved 0.04%
complementation efficiency (Fletcher et al. (1997), EMBO J. 16: 5123-5138). The
synthetic gene in accordance with the present invention, in the absence of VPR, results
in a complementation activity that is 750-fold more pronounced.
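
The 750-fold figure follows directly from the two complementation efficiencies quoted above (30% versus 0.04%). The short Python sketch below reproduces that arithmetic; it assumes, as an interpretation rather than a statement of the text, that both percentages are expressed relative to the wild type vector titer, and the function names are hypothetical.

```python
def percent_of_wild_type(fold_reduction: float) -> float:
    """Express an n-fold lower titer as a percentage of the wild type titer."""
    return 100.0 / fold_reduction


def fold_change(value: float, reference: float) -> float:
    """Ratio of two efficiencies given in the same units (here, percent)."""
    return value / reference


if __name__ == "__main__":
    d64v_background = percent_of_wild_type(20)  # D64V titer is 20-fold below WT
    print("D64V background, % of WT:", d64v_background)
    print("synthetic gene vs. unfused wild type construct:",
          round(fold_change(30.0, 0.04)), "fold")
    print("complemented vector vs. D64V background:",
          round(fold_change(30.0, d64v_background)), "fold")
```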

Moreover, evidence for trans-complementing activity of integrase expressed 
from the synthetic gene in target cells was also obtained (Table 1). Transduction of IN-
expressing 293T cells with IN-defective virus particles resulted in a higher
transduction efficiency as compared with the parental 293T cells. After passaging the 
transduced cells, the difference became even more pronounced. This points to a 
catalytic interaction of integrase present in the receptor cell with the pre-integration 
complex of the incoming vector. For the wild type and the complemented vectors
increased transduction efficiencies were obtained as well. This may suggest that the
amount of active integrase present in the viral particle is dose-limiting or that integrase 
present in the target cell neutralizes inhibitory host factors. 
Detection of integrase activity using a promoterless reporter gene (DIPR).

Integration of HIV in the chromosome does not show strict sequence- 
specificity, although a weak consensus was found for the integration sites (Carteau et 
al. (1998), J. Virol. 72: 4005-4014). It is commonly accepted, although not formally
proven, that retroviral integration is favored in open chromatin near or within active
transcription units (Rohdewohld et al. (1987), J. Virol. 61: 336-343; Scherdin et al.
(1990), J. Virol. 64: 907-912; Vijaya et al. (1986), J. Virol. 60: 683-692; Carteau et al.
(1998), J. Virol. 72: 4005-4014). The design of a promoterless reporter substrate for
measuring integrase activity in cell culture is based on this finding (Figs. 4A - C). In
accordance with an embodiment of the present invention a method is proposed in which
read-through transcription of the integrated promoterless reporter gene will occur when
inserted within an actively transcribed region of the chromosome. The construct
designed is a linear DNA fragment, flanked by the 200 bp terminal fragments of the
HIV LTRs that provide the integrase recognition sites. The marker gene may encode
luciferase, for instance. The presence of an IRES (internal ribosome entry site) in front
of the open reading frame of luciferase, directs cap-independent translation of mRNA 
transcripts (Fig. 4A). 

After transfection with this DIPR substrate (Fig. 4B, C), luciferase activity 
measured in 293T-INˢ cells was always 4 to 10 times higher than in the parental 293T
cells (Table 2). In the DIPR assay activity of the D64V mutant integrase was reduced
compared to the wild type integrase (data not shown). These results point to an activity
of the intracellularly expressed integrase (expressed by the synthetic gene in
accordance with the present invention) (Fig. 4C). Sequencing of integrated linear DNA
molecules in 293T cells transiently expressing integrase from the synthetic gene using
Alu-PCR, revealed the characteristic removal of the 3' GT dinucleotide in 10% of
integrants. In control cells not expressing integrase none of the DNA insertions showed 
this hallmark. 

Applications 

An embodiment of the present invention includes the construction of an
efficient eukaryotic expression vector for a retroviral enzyme, e.g. HIV-1 integrase,
based on the creation of a synthetic gene. Expression from the eukaryotic expression
vector can be transient after transfection of the plasmid in a eukaryotic cell by any of
the established transfection procedures. Expression may also be permanent in a cell line
stably harbouring the expression vector. An important aspect of the present invention
and its applications is the functionality of an expressed retroviral enzymatically active
protein, as opposed to the mere high level expression of an enzymatically inactive
retroviral protein.

Intracellular integrase test for the evaluation of integrase inhibitors 

An embodiment of the present invention includes assays for evaluating 
integrase activity in cells transfected with a DNA substrate that is flanked by fragments 
of HIV LTR, a so-called mini-HIV. In both assays data point to enzymatic activity of
IN. 

In the DIAS (detection of integrase activity through antibiotic selection) test, a
resistance gene to a cytotoxic drug is present in the mini-HIV DNA. The presence of
IN in the transfected cell augments stable insertion of the resistance gene in the
chromosome. Scoring is performed by counting the residual colonies resistant to the
cytotoxic agent and comparing that number with cells transfected with heterologous
DNA.

In DIPR (detection of integrase activity using a promoterless reporter gene), a 
reporter gene (luciferase) without promoter is present downstream of an internal
ribosome entry site (IRES) in mini-HIV (Fig. 4A). The presence of IN in the
transfected cell (Fig. 4B) augments stable insertion of the reporter construct in the host 
chromosome in close proximity to a cellular promoter (Fig. 4C). Scoring is performed 
by measuring enzyme activity expressed from the promoterless marker gene, e.g. 
luciferase. The latter assay is highly amenable to evaluation of integrase inhibitors in
cell culture in a microtiter plate format, adaptable for high throughput screening.
Potential integrase inhibitors would result in the absence of or a significant decrease in 
the level of detectable signal from the promoterless marker gene. 
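
How a DIPR readout might be scored for a test compound can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the function name, the choice of the parental cell signal as background, and all well readings are hypothetical and do not come from Table 2.

```python
from statistics import mean


def percent_inhibition(treated, untreated, background):
    """Percent loss of the integrase-dependent luciferase signal.

    treated    - readings from IN-expressing cells plus test compound
    untreated  - readings from IN-expressing cells without compound
    background - readings from parental cells lacking integrase
    """
    window = mean(untreated) - mean(background)
    if window <= 0:
        raise ValueError("no integrase-dependent signal window")
    return 100.0 * (mean(untreated) - mean(treated)) / window


if __name__ == "__main__":
    # Hypothetical relative light units from replicate wells.
    background = [95, 100, 105]
    untreated = [780, 800, 820]
    treated = [430, 450, 470]
    print(f"{percent_inhibition(treated, untreated, background):.0f}% inhibition")
```

Using the parental cell signal as the floor reflects the observation above that luciferase activity in integrase-expressing cells is several-fold higher than in parental cells; a compound that abolishes integrase activity would be expected to bring the signal down to that floor rather than to zero.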

Such an assay in accordance with the present invention involves screening
candidate inhibitory compounds from large libraries of synthetic or natural compounds. Synthetic
compound libraries are commercially available from, for example, Maybridge
Chemical Co. (Trevillet, Cornwall, UK), Comgenex (Princeton, NJ), Brandon
Associates (Merrimack, NH) and Microsource (New Milford, CT). A rare chemical
library is available from Aldrich Chemical Company, Inc. (Milwaukee, WI).
Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and
animal extracts are available from, for example, New Chemical Entities, Pan
Laboratories, Bothell, WA or MycoSearch (NC), Chiron, or are readily producible.
Plant extracts may also be obtained from the University of Ghent, Belgium.
Additionally, natural and synthetically produced libraries and compounds are readily
modified through conventional chemical, physical, and biochemical means.



Tool for non-viral cellular gene delivery. 

Cell lines that express integrase from a synthetic gene have a greater
propensity to integrate foreign DNA flanked by LTR fragments. These cell lines are
thus more transducible. An embodiment of this invention is the creation of eukaryotic
cell lines (or cell culture systems) that are highly transducible (at least 200% compared
to the parent cell). Embodiments of the present invention also include applications in
transgene technology to increase the efficiency of (non)homologous recombination in
ES cells, be it by transient expression from a plasmid or after induced expression of the
retroviral integrase in ES cells transgenic for the synthetic gene. The synthetic gene in 
accordance with the present invention may be brought into cells by any transfection 
agent or method (e.g. electroporation or lipofection) and may result in the stable 
integration of DNA in the chromosome. 


Retroviral (lentiviral) vector packaging construct 

From the complementation experiment it is clear that integrase expressed from 
the synthetic gene in the producer cell can complement integrase-defective lentiviral 
virus particles encoded by a packaging plasmid and thus can substitute for the protein
expressed by the packaging construct. It follows that in an expression vector based on
one or more synthetic gene(s) for a lentiviral gag-pol gene, the synthetic gene(s) can 
substitute for the natural gene(s) in the packaging constructs resulting in Rev- 
independent high level protein expression. The present invention includes a packaging 
construct based on non-lentiviral complex retroviruses in which protein expression is
dependent on a Rev homologue such as Rex in the case of HTLV-1. Lentiviral vectors
per se, capable of transducing a non-dividing cell, are known in the art (see Naldini et
al. (1996), Science 272: 263-267, Zufferey et al. (1997), Nature Biotechnol. 15: 871- 
875). Generally the vectors are plasmid-based or virus-based, and are configured to 




carry the essential sequences for incorporating nucleic acid, for selection and for 
transfer of the nucleic acid in the host cell. Gag, pol and env genes of interest are 
known in the art. Briefly, a first vector can provide a nucleic acid encoding a viral gag 
and a viral pol, and a second vector can provide a nucleic acid encoding a viral env
gene product to produce a packaging cell. Packaging cells or cell lines supply in trans
the proteins necessary for producing infectious virions, themselves being incapable of 
packaging endogenous viral genomic nucleic acids (Watanabe & Temin (1983), Molec. 
Cell Biol. 3(12): 2241-2249; Mann et al. (1983), Cell 33:153-159; Embretson & Temin 
(1987), J. Virol. 61(9): 2675-2683). Introducing a vector providing a heterologous
gene, the transfer vector, into such packaging cells yields producer cells which release
infectious particles carrying the foreign gene of interest. Methods for transfection or 
infection are well known by those skilled in the art. After cotransfection of the 
packaging vectors and the transfer vector to the packaging cell or cell line, the 
recombinant vector is recovered from the culture media and titered by standard
methods used by those of skill in the art.

The foreign or heterologous gene carried by the transfer vector can be any 
nucleic acid of interest which can be transcribed, but preferably is a nucleic acid 
encoding for a polypeptide of therapeutic benefit or of interest for gene therapy. The 
env gene in the (second) packaging vector can be derived from any virus, including
retroviruses, and is preferably amphotropic, allowing transduction of cells of human
and other species, and is preferably under control of non-endogenous regulatory 
sequences. Vectors can be made target-specific through linkage of the env protein with 
an antibody or a ligand for a particular receptor of a particular cell-type (cell-targeting). 
Design of the gag-pol synthetic gene is based on a method to circumvent mRNA
instability associated with these wild-type genes. Preferentially, the method used by us
to create an expression construct for high level and Rev independent eukaryotic
expression of active HIV-1 integrase is employed. Further construction of the vectors
of the present invention, whereby natural gag-pol genes are replaced respectively by
the synthetic genes of the present invention, employs standard ligation and restriction
techniques which are well understood in the art (see Maniatis et al., in Molecular
Cloning: a Laboratory Manual, Cold Spring Harbor Laboratory, N.Y., 1982).

Biosafety requires measures to reduce the risk of generating recombinant
replication competent retroviruses (RCR) as much as possible. Dividing the packaging
functions into two genomes, one which expresses the gag and pol genes and another
expressing the env gene product, helps to minimize the likelihood of generating RCR.
That approach minimizes the ability for co-packaging and subsequent transfer of the
two genomes, as well as significantly decreasing the frequency of recombination due to
the presence of three retroviral genomes in the packaging cell to produce RCR. To
render any possible recombinations non-functional, mutations (Danos & Mulligan
(1988), Proc. Natl. Acad. Sci. 85: 6460-6464) or deletions (Bosselman et al. (1987),
Molec. Cell Biol. 7(5): 1797-1806; Markowitz et al. (1988), 62(4): 1120-1124) can be
configured within the undesired gene products. Deletion of the 3'LTR of both
packaging constructs will further reduce the likelihood of forming functional
recombinants. US 5,994,136 describes the production of lentiviral vectors with an even
more remote possibility of generating replication competent lentiviruses by functionally
deleting the tat gene, which encodes a regulatory protein that promotes viral
expression through a transcriptional mechanism. The likelihood of recombination
between the transfer vector, which still contains natural genetic information of the
lentivirus such as part of the gag gene, and synthetic packaging genes will be considerably
reduced, further improving the biosafety of the lentiviral vectors. DNA sequence
mismatching, as induced by the replacement of nucleotides in the third position
compared to the natural gene, seems to present a considerable barrier to homologous
recombination in a wide variety of species. It is therefore also unlikely that
contaminating or endogenous HIV virus particles would exchange the natural integrase
gene for the synthetic one through recombination.

Experimental Procedures 


DNA constructs 

Construction of integrase expression plasmids 

The open reading frame of IN from the HIV-1 clone HXB2 was PCR
amplified using Pfu DNA polymerase (Stratagene, Cambridge, UK) with the primers
5'-CCCCCAAGCTTGCCAGCCATGTTTTTAGATGGAATAGATAAGG and 5'-
CCCGCTCC^GCTTTCCTTGAAATATACATATGGTG and subcloned in pCEP4
(Invitrogen, Leek, The Netherlands), resulting in pCEP-IN. The absence of mutations
was verified by DNA sequencing. The RRE sequence of HIV-1, clone HXB2, was PCR
amplified using the primers 5'-TTCCGCTCGA (7TAGCACCCACC AAGGCAAAGy* G
and 5'-TCGCGGATCCAAGGCACAGCAGTGGTGCAAATG. The PCR fragment
was subcloned in the sense orientation downstream of the integrase gene in pCEP-IN to 
produce pCEP-IN-RRE. The CTE sequence (obtained from plasmid pS12; Taberno et
al., 1996, J. Virol. 70: 5998-6011) was cloned in pCEP4 in the correct orientation,
followed by the insertion of the integrase gene upstream of the CTE. This resulted in 
the plasmid pCEP-IN-CTE. The construction of pGFP-IN is explained in Pluymers et 
al. (1999), (Virology 258: 327-332). The Rev expression plasmid, pEF321-cREV, was 
provided by Sandoz Forschungs Institut, Vienna, Austria. PCR amplification and
plasmid construction employed standard ligation and
restriction techniques and conditions which are well understood in the art (see Maniatis
et al., in Molecular Cloning: a Laboratory Manual, Cold Spring Harbor Laboratory,
N.Y., 1982).

Assembly of the synthetic gene

The restriction sites NheI, PstI, BamHI, NaeI, NarI (indicated in Fig. 2) divide
the sequence of the synthetic gene into 6 fragments each approximately 150 bp long
that correspond to the sequences 1-149, 144-306, 301-456, 451-623, 618-776, 771-930
(Fig. 2). Each of the fragments was constructed separately by annealing and extending
two partially complementary oligonucleotides (85-95 nt long, PAGE-purified and 5'-
phosphorylated, synthesized by Gibco BRL Life Technologies, Merelbeke, Belgium)
using Sequenase (Amersham-Pharmacia, Buckinghamshire, UK). Each fragment was
cloned into the EcoRV site of the vector pBluescript KS(+) (Stratagene, La Jolla, CA).
The sequence errors found in the resulting clones were repaired using either the
Stratagene Quick Change procedure with Pyrococcus furiosus (Pfu) polymerase (for a
base substitution in the fragment 451-623) or PCR (for deletions in the terminal regions
of the fragment 1-149). The full 930 bp sequence was built by stepwise assembly of the
fragments similar to the method described in WO 98/12207. Choice of the cloning
vector (pBluescript KS or SK) at each step was dictated by toxicity of the IN coding
DNA. Finally, the two halves of the IN gene (1-451 and 452-930) were ligated together
and cloned into pBluescript KS(+) resulting in pINˢ.
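
The fragment coordinates given above can be checked for internal consistency with a short calculation. The Python sketch below is illustrative only (the function names are hypothetical); it computes the length of each fragment, the overlap between neighbouring fragments, and the length of the assembled sequence.

```python
# Fragment coordinates (1-based, inclusive) as listed in the text.
FRAGMENTS = [(1, 149), (144, 306), (301, 456), (451, 623), (618, 776), (771, 930)]


def fragment_lengths(fragments):
    """Length in bp of each fragment."""
    return [end - start + 1 for start, end in fragments]


def overlaps(fragments):
    """Number of bases shared by each pair of neighbouring fragments."""
    return [prev_end - next_start + 1
            for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:])]


def assembled_length(fragments):
    """Length of the full sequence once shared bases are counted only once."""
    return sum(fragment_lengths(fragments)) - sum(overlaps(fragments))


if __name__ == "__main__":
    print("fragment lengths:", fragment_lengths(FRAGMENTS))
    print("overlaps:", overlaps(FRAGMENTS))
    print("assembled length:", assembled_length(FRAGMENTS))
```

Each neighbouring pair shares six bases, which is consistent with the fragments being joined at the five six-base restriction sites named above, and the assembled length comes to the stated 930 bp.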

Construction of mammalian expression vectors for INˢ
The plasmid pINˢ was digested by EcoRI and treated with T4 DNA polymerase
followed by restriction with XhoI. The 1 kb fragment carrying the INˢ gene was cloned
between the PvuII and XhoI sites of pCEP4 (Invitrogen, Leek, The Netherlands)
resulting in pCMV-INˢ. pCEP4 is an episomal mammalian expression vector containing
the human cytomegalovirus (hCMV) immediate early enhancer/promoter. The Epstein
Barr virus replication origin (oriP) and nuclear antigen (encoded by the EBNA-1 gene)
permit extrachromosomal replication in human, primate and canine cells. A
hygromycin resistance gene is present, permitting selection of stably transduced clones
by hygromycin B (GIBCO BRL). The same 1 kb fragment was also cloned between
NheI and XhoI sites of the pBK-RSV expression vector (Stratagene) (the NheI
cohesive end of the vector DNA was filled in using T4 DNA polymerase) resulting in
pRSV-INˢ. In this vector expression of the INˢ gene is driven by the promoter of Rous
sarcoma virus (RSV). The presence of the neomycin resistance gene allows selection of
stably transduced clones by geneticin (G418) (GIBCO BRL).


Construction of the substrate for the DIPR assay 

The DNA substrate for the DIPR assay was obtained by linearization of pLTR-
IRES-Luc with ScaI. This plasmid was constructed in the following way. First, the 350
bp KpnI/EcoRI fragment of pU3U5 (Cherepanov et al. (1999), Nucleic Acids Research
27: 2202-2210) containing the terminal U3 and U5 regions of the HXB2 HIV-1 LTRs
was cloned between the KpnI and EcoRI sites of pUC19 resulting in pUC-LTR. Then
the ScaI site occurring in the ampicillin resistance gene of pUC19 was destroyed by
partial digestion of pUC-LTR with ScaI and insertion of a fragment containing the
kanamycin resistance gene from the Tn5 transposon yielding pUC-LTR-kan. Finally,
7.5 kb pLTR-IRES-Luc was obtained by cloning the BamHI/PstI fragment of pBIR
(Martinez-Salas et al. (1993), J. Virol. 67: 3748-3755) carrying the IRES-luciferase
gene cassette, made blunt with T4 DNA polymerase (Gibco BRL), into the SmaI site of
pUC-LTR-kan.

Cell culture

HeLa and 293 cells were obtained from the American Type Culture Collection and
were grown in Dulbecco's modified Eagle's medium (DMEM) (GibcoBRL)
supplemented with 10 % FCS, 0.12 % (v/w) sodium bicarbonate (GibcoBRL),
2 mM glutamine (GibcoBRL) and 20 µg/ml gentamycin (GibcoBRL) at 37°C in a
5 % CO2 humidified atmosphere. 293T cells (obtained from Dr. O. Danos,
Evry, France) express SV40 large T antigen and were grown in DMEM (GibcoBRL)
with glutamax supplemented with 10% fetal calf serum, 45 U/ml penicillin G (Serva,
Heidelberg, Germany) and 45 µg/ml streptomycin sulphate (Sigma-Aldrich, Bornem,
Belgium).

293 and 293T cells were transfected using polyethylenimine (PEI) (Abdallah et
al. (1997), Hum. Gene Therapy 7:1947-1954). Polyethylenimine (Mw ~ 25,000) was
from Sigma-Aldrich (Bornem, Belgium). Cells were grown to 50-70 % confluency in
DMEM with glucose, glutamax and 10 % fetal calf serum (FCS) (Gibco BRL).
Medium was replaced by medium containing 1 % FCS 3 hours before transfection.
A mixture of DNA and PEI was added to the cells in a minimal volume of medium. The
next day the medium was changed to DMEM containing 25 mM HEPES. The transfection
efficiency obtained in this way was 50-80 %. HeLa cells were routinely transfected by
electroporation. The cells were first trypsinized at 80 % confluency and pelleted by low
speed centrifugation. The cells were then resuspended at a density of 2 x 10^6 cells/ml in
growth medium; 0.5 ml of this suspension was aliquoted into 4 mm cuvettes (Eurogentec,
Seraing, Belgium) and 20 µg DNA was added to the cell suspension. After the electric
pulse (10 nF, 250 V), cells were allowed to rest for 10 min at room temperature before
dilution into growth medium.

To establish stable cell lines expressing the IN^s gene, cells transfected with
pRSV-IN^s or pCMV-IN^s were cultured in the presence of 500 ng/ml geneticin (G418)
or 200 µg/ml hygromycin B (both from GIBCO BRL), respectively. Expression of IN
was assessed by western blotting and/or indirect immunofluorescence.

Western blotting and immunofluorescence

For western blotting and indirect immunofluorescence, rabbit polyclonal
antibodies directed against recombinant His-tagged HIV-1 IN were produced in house
and were purified using a 1 HiTrap rProteinA column (Pharmacia Biotech, Uppsala,
Sweden) according to established procedures (Ausubel et al. (1987), Current protocols
in molecular biology, John Wiley & Sons, New York). Western blotting was performed
using PVDF membranes (Bio-Rad), the ECL+ chemiluminescent detection system
(Amersham-Pharmacia) and HRP-conjugated goat anti-rabbit antibodies (Bio-Rad).
Dilutions used were 1:30000 for the primary antibody and 1:20000 for the secondary
antibody. The detection limit was 0.1-0.5 ng of recombinant integrase. Total protein
concentration was determined on cells lysed with 1% SDS/1 mM PMSF (Sigma), using
the BCA protein assay (Pierce, Illinois, USA). For western blot analysis, 10 µg of total
protein was evaluated.

For detection of IN expression in situ by indirect immunofluorescence
microscopy, cells were grown on glass slides (HeLa cells) or in permanox chamber
slides (GIBCO BRL) (293T cells). After 24-48 hrs, cells were washed with phosphate
buffered saline (PBS) supplemented with 1 mM Mg2+ and 0.5 mM Ca2+ (PBS+), fixed
in 100% methanol and blocked with 10% foetal calf serum (FCS) in PBS+. Incubations
with antibodies were carried out at 37°C in blocking solution. The primary antibody
(rabbit anti-IN) was diluted 1:20 to 1:80; the secondary FITC-conjugated swine
anti-rabbit antibody from Dako (Glostrup, Denmark) was diluted 1:40. Nuclear staining was
performed with 1 ng/ml 4',6-diamidino-2-phenylindole (DAPI) (Sigma) in methanol.
Fluorescence microscopy was performed with a Leitz microscope (Wetzlar, Germany)
using filter blocks 12 (FITC) or A (DAPI).



Detection of integrase activity using a promoterless reporter gene (DIPR)

293T and 293T-IN^s cells were seeded in six-well plates at a density of 10^6
cells/well 24 hr before transfection. Five µg of DNA was transfected per well using
PEI. At 48 hr post-transfection, 5 x 10^5 cells were lysed to determine the luciferase
activity using the Luciferase Assay System™ (Promega Benelux, Leiden, The Netherlands)
and the Lumicount™ (Packard, Meriden, CT). The protein concentration of the lysate was
determined using the Bradford method (Bio-Rad protein assay, Bio-Rad, Hercules,
CA). The relative luciferase activity was calculated by dividing the luminescence
values by the protein concentration.
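
The normalization described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the units and the example readings are hypothetical and do not come from the experiments reported here.

```python
def relative_luciferase_activity(luminescence, protein_concentration):
    """Divide a luminescence reading by the protein concentration of the lysate."""
    if protein_concentration <= 0:
        raise ValueError("protein concentration must be positive")
    return luminescence / protein_concentration


# Hypothetical readings: luminescence in arbitrary units, protein in mg/ml.
print(relative_luciferase_activity(1000.0, 2.0))
```

Normalizing to protein content makes wells with different cell numbers comparable; the guard against a non-positive concentration is an added safety check, not part of the described protocol.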

Lentiviral vectors
Lentiviral vector production

HIV-1-derived vector particles, pseudotyped with the envelope of vesicular
stomatitis virus (VSV), were produced by transfecting 293T cells with a packaging
plasmid encoding viral gag and pol proteins (pCMVΔR8.2), a plasmid encoding the
envelope of vesicular stomatitis virus (pMDG) and a plasmid encoding a reporter gene
flanked by two long terminal repeats (LTRs) (pHR'-CMVLacZ). The first generation
packaging plasmid, containing all HIV genes except for env, and the transfer vector
were a kind gift from Dr. O. Danos (Genethon, France). For transfection of a 10 cm
dish of 293T cells, a 700 µl mixture of three plasmids was made in 150 mM NaCl: 20
µg of vector plasmid, 10 µg of packaging construct and 5 µg of envelope plasmid. To
this DNA solution 700 µl of a PEI solution (110 µl of a 10 mM stock solution in 150
mM NaCl) was added slowly. After 15 min at room temperature, the DNA-PEI
complex was added dropwise to the 293T cells in DMEM medium with 1% FCS. After
overnight incubation, medium was replaced with medium containing 10% FCS.
Supernatants were collected from day two to five post-transfection. The vector particles
were sedimented by ultracentrifugation in a swinging-bucket rotor (SW27 Beckman,
Palo Alto, CA) at 25,000 rpm for 2 hr at 4°C. Pellets were redissolved in PBS, resulting
in a 100-fold concentration. Different viral stocks were normalized based on p24
antigen content (HIV-1 p24 Core Profile ELISA, DuPont, Dreieich, Germany) for use
in complementation assays.

Complementation experiments

Integrase-defective virus particles were produced using pCMVΔR8.2IN(D64V),
obtained from Dr. D. Trono (Geneva, Switzerland), as packaging plasmid (Naldini et
al. (1996), Science 272: 263-267). Complemented vectors were produced by expressing
integrase from pCEP-IN^s in 293T cells after quadruple transient transfection. Vector
preparations were normalized for p24 antigen count. Vector was added to target cells in
the presence of 2 ng/ml polybrene and left overnight. After removal of vector, cells
were incubated for an additional 36 hrs. Cells were washed with PBS, fixed with 0.75%
formaldehyde/0.05% glutaraldehyde in PBS, and stained with freshly prepared X-gal
substrate (5 mM potassium ferrocyanide, 5 mM potassium ferricyanide, 2 mM MgCl2
and 100 µg/ml 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal) (Biotech
Trade & Service GmbH, St. Leon-Rot, Germany) in PBS) at 37°C overnight. Each
transduction experiment was done in duplicate in a 96-well plate. Transduction
efficiency was determined by counting the number of blue cells 48 hrs after infection in
one of the wells, whereas the cells in the duplicate well were split 1:2. Half of the
sample remained in the well and was stained at confluency (passage 1) whereas the
other half was cultured in a 48-well plate. At confluency, these cells were again split
1:2. Finally, cells were brought in a 24-well plate and grown to confluency (passage 3,
dilution 1:8). After staining, the efficiency of stable transduction was measured by
counting blue colonies.
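
The cumulative dilution quoted for passage 3 follows from three successive 1:2 splits. A minimal sketch of that arithmetic, with a hypothetical helper name chosen for illustration:

```python
from fractions import Fraction


def cumulative_dilution(split_ratios):
    """Multiply successive split ratios to obtain the overall dilution."""
    total = Fraction(1)
    for ratio in split_ratios:
        total *= ratio
    return total


# Three consecutive 1:2 splits, as in the passaging scheme described above.
dilution = cumulative_dilution([Fraction(1, 2)] * 3)
print(f"{dilution.numerator}:{dilution.denominator}")  # prints 1:8
```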

Tables

Table 1. Complementation of integrase-defective lentiviral vector particles

                        Relative transduction efficiency (1)

Cells        Passage    WT vector    D64V vector    D64V vector
                                                    complemented with IN

293T         # 0        1.00         0.048          0.303
             # 3        1.00         0.007          0.320

293T-IN^s    # 0        1.565        0.09           0.510
             # 3        1.88         0.045          0.75

(1) Transduction efficiency is determined by counting galactosidase-positive cells (# 0) or
colonies of galactosidase-positive cells (# 3) relative to transduction efficiency obtained
by WT vector in 293T cells. Results of transduction by WT vector, D64V IN-defective
vector and D64V vectors complemented with IN in the producer cells are shown. Cells
were infected with normalized amounts of vector. Transduction was done both in 293T
cells and in 293T cells that are stably expressing IN. Average numbers for two separate
experiments are shown.
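
As a reading aid for Table 1, the ratio between the complemented and the uncomplemented D64V vector can be computed per row. The sketch below only restates the tabulated values; the function name and the choice of ratio are illustrative assumptions and are not an analysis reported in this document.

```python
# Relative transduction efficiencies copied from Table 1:
# (cells, passage) -> (WT vector, D64V vector, D64V vector complemented with IN)
TABLE_1 = {
    ("293T", 0): (1.00, 0.048, 0.303),
    ("293T", 3): (1.00, 0.007, 0.320),
    ("293T-IN^s", 0): (1.565, 0.09, 0.510),
    ("293T-IN^s", 3): (1.88, 0.045, 0.75),
}


def fold_over_defective(row):
    """Ratio of the complemented D64V value to the uncomplemented D64V value."""
    _wt, d64v, complemented = row
    return complemented / d64v


for (cells, passage), row in TABLE_1.items():
    print(f"{cells}, passage {passage}: {fold_over_defective(row):.1f}-fold")
```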



WO 00/65076 PCT/EP00/03765 

Table 2. Detection of integrase activity using a promoterless reporter gene (DIPR)

                                      Luciferase activity (relative units)

Experiment    Cell line     Blank (a)    LTR-IRES-Luc (b)    LTR-IRES-Luc + pCMV-IN^s (c)

A             293T          1            47 ± 1
              293T-IN^s     1            487 ± 119

B             293T          1            130 ± 24            489 ± 169
              293T-IN^s     1            499 ± 38            990 ± 183

(a) Relative background luciferase activity in cell lines.

(b) 293T and 293T-IN^s were transfected with equal amounts of linearized pLTR-IRES-Luc
under experimental conditions A. In experiment B total DNA concentration was
equalized with parental vector pCEP4.

(c) 293T and 293T-IN^s were transfected with linearized pLTR-IRES-Luc and 2 of
pCMV-IN^s.
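
To compare the two cell lines in Table 2, the mean luciferase activities in the LTR-IRES-Luc column can be expressed as a ratio. The following sketch uses only the tabulated means (the ± spreads are ignored); the dictionary layout and function name are assumptions made for illustration.

```python
# Mean luciferase activities (relative units) from Table 2, column "LTR-IRES-Luc".
LTR_IRES_LUC = {
    "A": {"293T": 47.0, "293T-IN^s": 487.0},
    "B": {"293T": 130.0, "293T-IN^s": 499.0},
}


def fold_increase(experiment):
    """Ratio of the 293T-IN^s mean to the 293T mean within one experiment."""
    values = LTR_IRES_LUC[experiment]
    return values["293T-IN^s"] / values["293T"]


for experiment in LTR_IRES_LUC:
    print(f"Experiment {experiment}: {fold_increase(experiment):.1f}-fold")
```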
CLAIMS

1. A detection method for intracellular integrase activity using a promoterless reporter
gene.

2. The detection method according to claim 1, wherein the reporter gene may be
luciferase, GFP or an antibiotic selection marker.

3. The detection method according to claim 1 or 2, wherein a reporter gene construct is
generated from the reporter gene and the construct is used as the substrate of the
enzymatically active retroviral protein expressed from the synthetic gene in accordance
with claims 17 to 26.

4. Packaging construct for a lentiviral or complex retroviral vector based on a synthetic
gag or pol gene.

5. Packaging construct according to claim 4, wherein the synthetic gene is the synthetic
gene in accordance with any of claims 17 to 26.

6. A method of transfecting a eukaryotic cell using the expression vector in accordance
with any of claims 27 to 29.

7. A eukaryotic cell line harboring the synthetic gene or region of a gene in accordance
with any of the claims 17 to 26.

8. The eukaryotic cell line according to claim 7, wherein the retroviral enzymatically
active protein is expressed using a constitutive, inducible or tissue-specific promoter.

9. The eukaryotic cell line according to claim 7 or 8, wherein the expression is stable.

10. A transgenic, non-human animal harboring the synthetic gene or region of a gene in
accordance with claims 17 to 26.

11. The transgenic animal according to claim 10, wherein the expression of the synthetic
gene or region of a gene is induced by an inducible promoter or by a tissue-specific
promoter.

12. The transgenic animal according to claim 10 or 11, wherein the animal is a mammal.

13. A method for preparing a synthetic gene or region of a gene encoding a retroviral
protein or part of such a protein which is enzymatically active in a target eukaryotic cell,
comprising the steps of:
1) identifying a group of genes from the total set of genes of the target eukaryotic cell
which encode proteins which are naturally expressed easily and/or in high concentrations
in the target cell;
2) determining the codon sequences of these identified genes and from these sequences a
preferred codon usage and a preferred nucleotide pair frequency;
3) using the preferred codon usage, identify the non-preferred codons in the natural gene
encoding the enzymatically active protein;
4) replacing one or more of the non-preferred codons with one or more preferred codons
encoding the same amino acids as the replaced codons while biasing the replacement to
obtain the preferred nucleotide pair frequency.

14. The method according to claim 13, wherein the replacement step is carried out based
on a random choice between alternative codons encoding the same amino acid at each
position using a random number generator and biasing the choice of alternative codons
based on the preferred codon usage to obtain the preferred nucleotide pair frequency.

15. A method for gene transfer in a eukaryotic cell expressing the synthetic gene or
region of the gene in accordance with any of the claims 17 to 26.

16. A method according to claim 15, wherein the synthetic gene is transiently expressed
or is stably integrated in said cell.

17. A synthetic retroviral gag or pol gene or a region of a retroviral gag or pol gene for
the expression of a retroviral gag or pol protein in a eukaryotic cell, the expressed
retroviral protein being expressed at a level to provide detectable activity of the wild-type
function of the expressed retroviral protein in the eukaryotic cell.

18. The synthetic gene according to claim 17, wherein the retroviral genes have
non-preferred codons when referred to the eukaryotic cell, the number of non-preferred
codons being such that replacement of all the non-preferred codons by preferred codons
with respect to the eukaryotic cell results in a GC nucleotide pair content of 65% or
higher, the synthetic gene having a GC nucleotide pair content of between 53 and 63%,
more preferably between 55 and 61% and the expressed retroviral protein is expressed at
a level to provide detectable enzymatic activity of the expressed retroviral protein in the
eukaryotic cell.

19. The synthetic gene according to claim 17 or 18, wherein the expression of the gag or
pol proteins is independent of retroviral regulatory proteins.

20. The synthetic gene according to any of the claims 17 to 19, wherein the retroviral
protein is a lentiviral gag or pol protein.

21. The synthetic gene according to claim 20, wherein the lentiviral protein is an HIV
gag or pol protein.

22. The synthetic gene according to any of claims 18 to 21, wherein the detectable
activity of the enzymatic function includes at least promotion or stimulation of the
integration of DNA fragments into the host cell DNA, preferably the chromosome of the
host cell.

23. The synthetic gene according to any of claims 17 to 22, wherein the retroviral protein
is a protease, reverse transcriptase, integrase protein or a polyprotein gag-pol precursor
thereof.

24. The synthetic gene according to any of the claims 17 to 23, wherein the eukaryotic
cell is a mammalian cell.

25. The synthetic gene according to any of the claims 17 to 24, wherein the expression of
the protein is at a level of at least 200% of that expressed by the wild type gene in the
eukaryotic cell.

26. The synthetic gene according to any of the claims 17 to 25, comprising the sequence
of Fig. 2A or homologs thereof which have a GC content between 53 and 63%,
preferably between 55 and 61%.

27. A eukaryotic expression vector comprising the synthetic gene or region of a gene in
accordance with any of the claims 17 to 26.

28. The expression vector according to claim 27, further comprising a constitutive or an
inducible or a tissue-specific promoter.

29. The expression vector according to claim 27 or 28, comprising a plasmid, a
mammalian or an insect virus.
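
The random, usage-biased codon choice recited in claims 13 and 14 can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the procedure used to design the gene: the codon weights below are hypothetical placeholder values, only a handful of amino acids are covered, and the additional biasing towards a preferred nucleotide pair frequency is not implemented.

```python
import random

# Hypothetical codon weights per amino acid (illustrative values only,
# not taken from this document or from a measured codon usage table).
CODON_WEIGHTS = {
    "M": {"atg": 1.0},
    "G": {"ggc": 0.34, "gga": 0.25, "ggg": 0.25, "ggt": 0.16},
    "F": {"ttc": 0.55, "ttt": 0.45},
    "L": {"ctg": 0.40, "ctc": 0.20, "ttg": 0.13, "ctt": 0.13, "cta": 0.07, "tta": 0.07},
    "D": {"gac": 0.54, "gat": 0.46},
    "I": {"atc": 0.48, "att": 0.36, "ata": 0.16},
    "K": {"aag": 0.58, "aaa": 0.42},
}


def choose_codons(protein, weights=CODON_WEIGHTS, seed=0):
    """Pick one synonymous codon per residue, at random, weighted by usage."""
    rng = random.Random(seed)  # seeded generator makes the choice reproducible
    codons = []
    for residue in protein:
        options = weights[residue]
        codon = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        codons.append(codon)
    return codons


def translate(codons, weights=CODON_WEIGHTS):
    """Map codons back to amino acids to confirm the protein is unchanged."""
    reverse = {codon: aa for aa, options in weights.items() for codon in options}
    return "".join(reverse[codon] for codon in codons)


protein = "MGFLDGIDK"  # first nine residues of the integrase sequence in the listing
synthetic = choose_codons(protein)
print(" ".join(synthetic))
assert translate(synthetic) == protein
```

Because every candidate codon encodes the same amino acid as the one it replaces, the encoded protein is unchanged whatever the random draw; only the nucleotide sequence varies.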
Pluymers, Wim

<120> A synthetic gene for expression of a retroviral protein
      with wild type activity in eukaryotic cells

<130> K1291-PCT

<140>
<141>

<150> EP99201306.0
<151> 1999-04-26

<150> EP00200171.7
<151> 2000-01-18

<160> 2

<170> PatentIn Ver. 2.1

<210> 1
<211> 930
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: synthetic gene
      expressing HIV integrase

<220>
<221> CDS
<222> (27)..(899)

<220>
<221> misc_signal
<222> (24)..(30)
<223> Kozak sequence

<400> 1

atcactagca acctcaaaca gacacc atg gga ttc ctg gac ggc att gac aag      53
                             Met Gly Phe Leu Asp Gly Ile Asp Lys
                               1               5

gct cag gag gag cac gag aag tac cac tcg aat tgg cgg gcc atg gcc     101
Ala Gln Glu Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala
 10                  15                  20                  25

tcc gac ttc aac ctg cca ccc gtc gtc gct aag gag atc gtt gct agc     149
Ser Asp Phe Asn Leu Pro Pro Val Val Ala Lys Glu Ile Val Ala Ser
                 30                  35                  40

tgc gac aag tgc cag ctg aaa ggc gag gct atg cac ggg cag gtt gat     197
Cys Asp Lys Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln Val Asp
             45                  50                  55

tgc tct ccc ggc atc tgg cag ctc gac tgt act cac ctg gag ggc aag     245
Cys Ser Pro Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys
         60                  65                  70

gtc atc ctg gtc gcc gtg cac gtg gcc tct ggt tac atc gag gct gag     293
Val Ile Leu Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu
     75                  80                  85

gtc atc cct gca gag act ggc cag gag act gcc tat ttc ctg ctg aaa     341
Val Ile Pro Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu Leu Lys
 90                  95                 100                 105

ctg gcc ggc cgg tgg cct gtg aag aca gtg cac aca gat aac ggc tcc     389
Leu Ala Gly Arg Trp Pro Val Lys Thr Val His Thr Asp Asn Gly Ser
                110                 115                 120

aac ttc acc tcc acc act gtg aag gct gcc tgc tgg tgg gct ggg atc     437
Asn Phe Thr Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala Gly Ile
            125                 130                 135

aag cag gag ttc ggg atc ccc tat aac cca cag tct cag ggc gtg atc     485
Lys Gln Glu Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Ile
        140                 145                 150

gaa tcc atg aac aag gag ctg aag aag atc atc ggc cag gtt cgg gac     533
Glu Ser Met Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp
    155                 160                 165

cag gca gag cac ctg aag act gca gtg cag atg gcc gtg ttc atc cac     581
Gln Ala Glu His Leu Lys Thr Ala Val Gln Met Ala Val Phe Ile His
170                 175                 180                 185

aac ttc aag cga aag ggc ggc atc ggt ggc tac tca gcc ggc gag cgg     629
Asn Phe Lys Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly Glu Arg
                190                 195                 200

atc gtg gac atc atc gcc act gac atc cag acc aaa gag ctg cag aag     677
Ile Val Asp Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys
            205                 210                 215

cag atc acc aag atc cag aac ttc cgt gtg tac tac cgg gac tcc cgg     725
Gln Ile Thr Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg
        220                 225                 230

gac cct gtg tgg aag ggc cct gcc aag ctg ctg tgg aag ggc gag ggc     773
Asp Pro Val Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly
    235                 240                 245

gcc gtg gtc att cag gac aac tct gac atc aag gtt gtg ccc agg cgc     821
Ala Val Val Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg
250                 255                 260                 265

aag gcc aag att atc cgg gac tac ggc aag cag atg gct ggc gac gac     869
Lys Ala Lys Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly Asp Asp
                270                 275                 280

tgt gtg gcc tct cgt caa gat gag gac taa gtccaactac taaactgggg       919
Cys Val Ala Ser Arg Gln Asp Glu Asp
            285                 290

gatattatga t                                                        930
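
The GC content window recited in the claims (between 53 and 63%) can be checked for any nucleotide sequence with a few lines of Python. This is a minimal sketch with hypothetical function names; the fragment used is only the first nine codons of sequence 1, so its GC content says nothing about the full 930 bp gene.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (non-base characters ignored)."""
    bases = [base for base in sequence.lower() if base in "acgt"]
    if not bases:
        raise ValueError("no nucleotides found")
    return sum(base in "gc" for base in bases) / len(bases)


def within_window(sequence, low=0.53, high=0.63):
    """True if the GC content lies inside the given window (defaults: 53-63%)."""
    return low <= gc_content(sequence) <= high


fragment = "atg gga ttc ctg gac ggc att gac aag"  # first nine codons of sequence 1
print(f"GC content of fragment: {gc_content(fragment):.1%}")
```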



<210> 2
<211> 290
<212> PRT
<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic gene
      expressing HIV integrase

<400> 2

Met Gly Phe Leu Asp Gly Ile Asp Lys Ala Gln Glu Glu His Glu Lys
  1               5                  10                  15

Tyr His Ser Asn Trp Arg Ala Met Ala Ser Asp Phe Asn Leu Pro Pro
             20                  25                  30

Val Val Ala Lys Glu Ile Val Ala Ser Cys Asp Lys Cys Gln Leu Lys
         35                  40                  45

Gly Glu Ala Met His Gly Gln Val Asp Cys Ser Pro Gly Ile Trp Gln
     50                  55                  60

Leu Asp Cys Thr His Leu Glu Gly Lys Val Ile Leu Val Ala Val His
 65                  70                  75                  80

Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile Pro Ala Glu Thr Gly
                 85                  90                  95

Gln Glu Thr Ala Tyr Phe Leu Leu Lys Leu Ala Gly Arg Trp Pro Val
            100                 105                 110

Lys Thr Val His Thr Asp Asn Gly Ser Asn Phe Thr Ser Thr Thr Val
        115                 120                 125

Lys Ala Ala Cys Trp Trp Ala Gly Ile Lys Gln Glu Phe Gly Ile Pro
    130                 135                 140

Tyr Asn Pro Gln Ser Gln Gly Val Ile Glu Ser Met Asn Lys Glu Leu
145                 150                 155                 160

Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala Glu His Leu Lys Thr
                165                 170                 175

Ala Val Gln Met Ala Val Phe Ile His Asn Phe Lys Arg Lys Gly Gly
            180                 185                 190

Ile Gly Gly Tyr Ser Ala Gly Glu Arg Ile Val Asp Ile Ile Ala Thr
        195                 200                 205

Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile Thr Lys Ile Gln Asn
    210                 215                 220

Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro Val Trp Lys Gly Pro
225                 230                 235                 240

Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val Val Ile Gln Asp Asn
                245                 250                 255

Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala Lys Ile Ile Arg Asp
            260                 265                 270

Tyr Gly Lys Gln Met Ala Gly Asp Asp Cys Val Ala Ser Arg Gln Asp
        275                 280                 285

Glu Asp
    290
Fig. 3

[Figure: lanes 1, 2, 3; size markers at 66 kDa, 46 kDa and 30 kDa]

Lanes: 1. 2.5 ng HT-IN
       2. 293T
       3. 293T-IN^s
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SE^VBJ.?CE LISTING 

<110> K -U. Leuven. P.« CO rch 4 Development 
D© Clercq r Erik 
Pluyniexs, wim 

C220> A *y*th*fcic gene far expn^.ipa or a retroviral protein 
Kit* wilrf type activity in eukBiryotic cells 

<140> 

<X5Q> EPS32C-1306.0 
<1M> 1999-04-26. 

<1 SO> BP 0 02 DDI 7 1 . 7 
<I5I> 2000-01-1 ft 

<16D> 2 

<17D> ten tin Ver, 2,1 

<210> 1 
<211> 
<212> UtV\ 

<213> Artificial Sequence 

<22D> 

<223> r^fltciption or Artific-iai £equ^c C;3}T itheLic gene 
expressing K1V i/>L^r»?e 

<2ZO> 

<2S1> CDS 

<222> (27) . . r&SiO 

<22Q> 

■:;22 1> mi 5 c_ a i o; a 1 

<:>£:>> (2ai| r . fio] 

<2"23> K* « x a e>q a erice 

<4CK» 1 

at-cactagca a^tcaaaca gAe*cc *tg gga ttc ctg q*r. 5 g C atL w Mg &3 

Met Gly Phe leu. Asp gly lie tys 



1 
Fig. *. Principle of DIPR

Detection of integrase activity using a promoterless reporter gene

A. Substrate LTR-IRES-Luc (digested with ScaI)

B. Transfection into cells, binding of integrase to U3-U5 ends
and cleavage of termini

C. Integration into actively transcribed regions of genomic DNA,